Scientists increasingly rely on Python tools to perform scalable distributed memory array operations using rich, NumPy-like expressions. However, many of these tools rely on dynamic schedulers optimized for abstract task graphs, which often encounter memory- and network-bandwidth-related bottlenecks due to sub-optimal data and operator placement decisions. Tools built on the Message Passing Interface (MPI), such as ScaLAPACK and SLATE, have better scaling properties, but these solutions require specialized knowledge to use. In this work, we present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems. This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS). LSHS is a local search method which optimizes operator placement by minimizing the maximum memory and network load on any given node within the distributed system. Coupled with a heuristic for load-balanced data layouts, our approach is able to attain communication lower bounds on some common numerical operations, and our empirical study shows that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem. On terabyte-scale data, NumS achieves competitive performance on DGEMM, up to a 20x speedup over Dask on a key operation of the QR decomposition, and a 2x speedup on logistic regression compared to Dask ML and Spark's MLlib.
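As a rough illustration of the kind of placement rule the abstract describes (a local search that simulates load and places each operator on the node minimizing the maximum memory and network load), the following is a minimal, hypothetical greedy sketch; the node model, cost fields, and function names are assumptions and not the NumS API.

```python
# Minimal sketch of load-simulated operator placement (assumed, not the NumS API).
# Each node tracks simulated memory and network load; an operator is placed on the
# node that minimizes the resulting maximum load across the cluster.
from dataclasses import dataclass

@dataclass
class Node:
    mem: float = 0.0   # simulated bytes resident
    net: float = 0.0   # simulated bytes transferred

def place(op_mem, op_net, nodes):
    """Return the index of the node whose selection minimizes the worst-node load."""
    best_i, best_cost = None, float("inf")
    for i, n in enumerate(nodes):
        # Simulate placing the operator on node i and measure the resulting peaks.
        peak_mem = max(max(m.mem for m in nodes), n.mem + op_mem)
        peak_net = max(max(m.net for m in nodes), n.net + op_net)
        cost = max(peak_mem, peak_net)  # objective: minimize the most-loaded node
        if cost < best_cost:
            best_i, best_cost = i, cost
    nodes[best_i].mem += op_mem
    nodes[best_i].net += op_net
    return best_i

nodes = [Node() for _ in range(4)]
for op_mem, op_net in [(8.0, 2.0), (4.0, 6.0), (8.0, 2.0)]:
    print(place(op_mem, op_net, nodes))
```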
Over the past decade, there has been a significant increase in the use of Unmanned Aerial Vehicles (UAVs) to support a wide variety of missions, such as remote surveillance, vehicle tracking, and object detection. For problems involving the processing of areas larger than a single image, mosaicking of UAV imagery is a necessary step. Real-time image mosaicking is used for missions that require a fast response, such as search and rescue. It typically requires information from additional sensors, such as a Global Positioning System (GPS) and an Inertial Measurement Unit (IMU), to facilitate direct orientation, or 3D reconstruction approaches to recover the camera poses. This paper proposes a UAV-based system for the real-time creation of incremental mosaics which does not require either direct or indirect camera parameters such as orientation information. Inspired by previous approaches, in the mosaicking process, feature extraction from images, matching of similar key points between images, finding a homography matrix to warp and align images, and blending images to obtain better-looking mosaics play important roles in achieving a high-quality result. Edge detection is used in the blending step as a novel approach. Experimental results show that the real-time incremental image mosaicking process can be completed satisfactorily and without the need for any additional camera parameters.
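A minimal sketch of the pipeline steps named above (feature extraction, key-point matching, homography estimation, warping, and a simple blend), using standard OpenCV calls; the detector choice, parameters, and the naive blending shown here are illustrative assumptions, not the paper's implementation.

```python
# Sketch of pairwise image mosaicking with OpenCV (illustrative only).
import cv2
import numpy as np

def stitch_pair(mosaic, frame):
    orb = cv2.ORB_create(2000)                      # feature extraction
    k1, d1 = orb.detectAndCompute(mosaic, None)
    k2, d2 = orb.detectAndCompute(frame, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:200]

    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # align frame to mosaic

    h, w = mosaic.shape[:2]
    warped = cv2.warpPerspective(frame, H, (w, h))

    # Naive blend: keep existing mosaic pixels, fill empty regions from the warp
    # (a real system would expand the canvas and blend seams, e.g. using edges).
    mask = cv2.cvtColor(mosaic, cv2.COLOR_BGR2GRAY) == 0
    mosaic[mask] = warped[mask]
    return mosaic
```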
PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes performance bounds for bandit problems and an experimental comparison of these bounds. Our experimental comparison has revealed that available PAC-Bayes upper bounds on the cumulative regret are loose, whereas available PAC-Bayes lower bounds on the expected reward can be surprisingly tight. We found that an offline contextual bandit algorithm that learns a policy by optimising a PAC-Bayes bound was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees.
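To make the notion of a PAC-Bayes lower bound on expected reward concrete, here is a sketch of one standard McAllester-style form for [0, 1]-bounded rewards; this is a generic bound rather than the specific bounds compared in the survey, and the numbers in the example are made up.

```python
# Sketch: a McAllester-style PAC-Bayes lower bound on expected reward.
# With probability >= 1 - delta over n i.i.d. samples of a [0, 1]-bounded reward:
#   E_rho[R] >= E_rho[R_hat] - sqrt((KL(rho || pi) + ln(2 * sqrt(n) / delta)) / (2 * n))
import math

def pac_bayes_reward_lower_bound(emp_reward, kl, n, delta=0.05):
    """emp_reward: posterior-averaged empirical reward in [0, 1];
    kl: KL divergence between the posterior and the prior over policies."""
    slack = math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    return emp_reward - slack

# Example: 10,000 logged interactions and a modest KL to the prior.
print(pac_bayes_reward_lower_bound(emp_reward=0.62, kl=25.0, n=10_000))
```

A bound-optimising algorithm of the kind the abstract mentions would treat such an expression as a training objective, trading off empirical reward against the KL term.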
Machine learning has been widely used in healthcare applications to approximate complex models for clinical diagnosis, prognosis, and treatment. Although deep learning has an outstanding ability to extract information from time series, its true capabilities on sparse, irregularly sampled, multivariate, and imbalanced physiological data have not yet been fully explored. In this paper, we systematically examine the performance of machine learning models for clinical prediction tasks based on the EHR, especially physiological time series. We choose the PhysioNet 2019 challenge public dataset to predict sepsis outcomes in ICUs. Ten baseline machine learning models are compared, including 3 deep learning methods and 7 non-deep learning methods commonly used in the clinical prediction domain. Nine evaluation metrics with specific clinical implications are used to assess the performance of the models. In addition, we sub-sample the training dataset and fit learning curves to investigate the impact of training dataset size on model performance. We also propose a general pre-processing method for physiological time-series data and use the Dice loss to deal with the dataset imbalance problem. The results show that deep learning indeed outperforms non-deep learning, but only under certain conditions: first, when evaluated with particular metrics (AUROC, AUPRC, Sensitivity, and FNR), but not others; second, when the training dataset is large enough (estimated to be on the order of thousands of samples).
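Since the abstract mentions using the Dice loss to handle class imbalance, here is a minimal sketch of a standard soft Dice loss for binary prediction; the exact formulation and smoothing constant used in the paper are not specified, so these are assumptions.

```python
# Sketch of a standard soft Dice loss for imbalanced binary prediction (illustrative).
import torch

def dice_loss(logits, targets, smooth=1.0):
    """logits: raw model outputs; targets: binary labels of the same shape."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    union = probs.sum() + targets.sum()
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)

logits = torch.randn(32)                      # e.g. one prediction per time window
targets = torch.randint(0, 2, (32,)).float()  # heavily imbalanced in practice
print(dice_loss(logits, targets).item())
```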
Gaussian process state-space models capture complex temporal dependencies in a principled manner by placing a Gaussian process prior over the transition function. These models have a natural interpretation as discretized stochastic differential equations, but inference for long sequences with fast and slow transitions is difficult. Fast transitions require tight discretizations, whereas slow transitions require backpropagating gradients over long subtrajectories. We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales. The combined model allows traversing time on adaptive scales, providing efficient inference for arbitrarily long sequences with complex dynamics. We benchmark our new method on semi-synthetic data and on an engine modelling task. In both experiments, our approach compares favourably against state-of-the-art alternatives that operate on a single timescale only.
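Purely as an illustration of the idea of transition components operating at different temporal resolutions, the toy sketch below advances a state with one transition applied at every step and a second, slower transition applied only every k-th step; the component form, combination rule, and all names are assumptions and do not reflect the paper's actual Gaussian process model.

```python
# Toy illustration of combining transition components at different temporal
# resolutions (assumed structure; GP transitions are replaced by fixed toy
# functions for brevity).
import numpy as np

def fast_transition(x):
    return 0.9 * x + 0.1 * np.tanh(x)      # acts at every step (fine resolution)

def slow_transition(x):
    return x + 0.5 * np.sin(x)             # acts every k steps (coarse resolution)

def rollout(x0, steps, k=10):
    xs, x = [x0], x0
    for t in range(1, steps + 1):
        x = fast_transition(x)
        if t % k == 0:                      # the coarse component updates less often
            x = slow_transition(x)
        xs.append(x)
    return np.array(xs)

print(rollout(np.array([1.0, -0.5]), steps=50)[-1])
```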
Neural stochastic differential equations (NSDEs) model a stochastic process with drift and diffusion functions given by neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have so far remained unexplored. We report the empirical finding that obtaining well-calibrated uncertainty estimates from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment-matching algorithm: vertically along the neural network layers and horizontally along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only at the price of a high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy.
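As a hedged illustration of deterministic moment propagation along the time direction (the "horizontal" direction mentioned above; the layer-wise, "vertical" part is omitted), the sketch below propagates a Gaussian mean and variance through an Euler-Maruyama discretization of a one-dimensional SDE by linearizing the drift; the concrete SDE and the linearization-based approximation are assumptions, not the paper's algorithm.

```python
# Sketch: deterministic propagation of mean/variance through an Euler-Maruyama
# step of a 1-D SDE dx = f(x) dt + g(x) dW, via drift linearization
# (assumed-density style; illustrative only, not the paper's bidimensional scheme).
import numpy as np

def f(x):  return -1.0 * x + np.sin(x)         # toy drift (stands in for a neural net)
def df(x): return -1.0 + np.cos(x)             # its derivative, evaluated at the mean
def g(x):  return 0.3                          # toy (constant) diffusion

def propagate_moments(m0, v0, dt=0.01, steps=500):
    m, v = m0, v0
    for _ in range(steps):
        a = df(m)                               # local linearization of the drift
        m = m + f(m) * dt                       # mean update
        v = v + (2.0 * a * v + g(m) ** 2) * dt  # first-order variance update
        v = max(v, 1e-12)                       # keep the variance non-negative
    return m, v

print(propagate_moments(m0=1.0, v0=0.1))
```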